Lecture 11 - Multiple Regression
Lecture 12: Review
ANOVA
- Analysis of variance: single and multi-factor designs
- Examples: diatoms, circadian rhythms
- Predictor variables: fixed vs. random
- ANOVA model
- Analysis and partitioning of variance
- Null hypothesis
- Assumptions and diagnostics
- Post F Tests - Tukey and others
- Reporting the results
Lecture 13: Multifactor ANOVA Overview
Multifactor ANOVA
- nested and factorial designs
- Nested design examples
- Factorial design examples
- Nested designs
- Linear model
- Analysis of variance
- Null hypotheses
- Unbalanced designs
- Assumptions
- Regression, t-test, ANOVA
- Regression Assumptions
- Model II Regression
- Regression parameters
- Analysis of variance
- Null hypotheses
- Explained variance
- Assumptions and diagnostics
- Collinearity
- Interactions
- Dummy variables
- Model selection
- Importance of predictors
- If predictors continuous
- Mix between categorical and continuous
- Can use multiple linear regression
- reduce unexplained variance
- look at interactions
- Can have more factors (e.g., 3-way ANOVA)
- interpretation tricky…
- Nested/hierarchical: levels of B occur only in 1 level of A
- Factorial/crossed: every level of B in every level of A
- Factor A usually fixed
- Factor B usually random
- Both factors typically fixed (but not always)
- 2 enclosure sizes (factor A)
- 5 replicate enclosures (factor B)
- 5 replicate limpets per enclosure
- 4 levels of urchin grazing: none, L, M, H
- 4 patches of rocky bottom (3-4 m2) nested in each level of grazing
- 5 replicate quadrats per patch
- 3 light levels (factor A)
- 3 size classes (factor B)
- 5 replicate seeding in each cell
- 2 food levels (factor A)
- presence/absence of tadpoles (factor B)
- 8 replicates in each cell
- 2 seasons (factor A)
- 4 density treatments (factor B)
- 3 replicates in each cell
- p levels of factor A (i= 1…p) (e.g., 4 grazing levels)
- q levels of factor B (j= 1…q), nested within each level of A (e.g., 4 diff. patches per grazing level)
- n replicates (k= 1…n) in each combination of A and B (5 replicate quadrats in each patch in each grazing level)
- overall mean (across all levels of A and B): ȳ
- a mean for each level of A (across all levels of B in that A): ȳi
- a mean for each level of B within each A: ȳj(i)
\(y_{ijk}\) is the response variable
value of the k-th replicate in j-th level of B in the i-th level of A
(algal biomass in 3rd quadrat, in 2nd patch in low grazing treatment)
\(\mu\) is the overall mean
- (overall average algal biomass)
\(\alpha_i\) is the fixed effect of the \(i\)-th level of factor A
(difference between average biomass in all low grazing level quadrats and overall mean)
\(\beta_{j(i)}\) is the random effect of the \(j\)-th level of B nested within the \(i\)-th level of A
usually random variable, measuring variance among all possible levels of B within each level of A
(variance among all possible patches that may have been used in the low grazing treatment)
- \(\varepsilon_{ijk}\) is the error term
- αi is the effect of the ith level of A: µi - µ
- unexplained variance associated with the kth replicate in jth level of B in the ith level of A
- (difference bw observed algal biomass in 3rd quadrat in 2nd patch in low grazing treatment and predicted biomass - average biomass in 2nd patch in low grazing treatment)
- SSresid is difference bw each observation and mean for its level of factor B, summed over all observations
- SStotal = SSA + SSB + SSresid
- SS can be turned into MS by dividing by appropriate df
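This partitioning can be sketched numerically. The following is a minimal Python illustration with an invented, balanced nested dataset (2 levels of A, 2 levels of B nested in each A, 2 replicates); real analyses would use ANOVA software.

```python
# Sketch: partitioning SS for a tiny balanced nested design; data invented.
data = {  # A level -> B level (nested) -> replicate values
    "A1": {"B1": [3.0, 4.0], "B2": [5.0, 6.0]},
    "A2": {"B1": [7.0, 8.0], "B2": [9.0, 10.0]},
}

all_vals = [v for bs in data.values() for vals in bs.values() for v in vals]
grand = sum(all_vals) / len(all_vals)  # overall mean

ss_a = ss_b = ss_resid = 0.0
for a, bs in data.items():
    a_vals = [v for vals in bs.values() for v in vals]
    a_mean = sum(a_vals) / len(a_vals)
    # SSA: difference between each A-level mean and the overall mean
    ss_a += len(a_vals) * (a_mean - grand) ** 2
    for b, vals in bs.items():
        b_mean = sum(vals) / len(vals)
        # SSB: difference between each B mean and its A-level mean
        ss_b += len(vals) * (b_mean - a_mean) ** 2
        # SSresid: difference between each observation and its B mean
        ss_resid += sum((v - b_mean) ** 2 for v in vals)

ss_total = sum((v - grand) ** 2 for v in all_vals)
```

For this balanced design the identity SStotal = SSA + SSB + SSresid holds exactly.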
- no effects of factor A
- Assuming A is fixed:
- Ho(A): µ1 = µ2 = µ3 = … = µi = µ
- Same as in 1-factor ANOVA, using means from B factors nested within each level of A
- (no difference in algal biomass across all levels of grazing: µnone = µlow = µmed = µhigh)
- No effects of factor B nested in A
- Assuming B is random:
- Ho(B): σβ2 = 0 (no variance added due to differences between all possible levels of B)
- (no variance added due to differences between patches)
- uneven number of B levels within each A
- uneven number of replicates within each level of B
- equal variance
- normality
- independence of observations
- Since means for each level of B within each A are used for the hypothesis test about A, need to assess whether those means meet normality and equal variance
- Examine residuals for the hypothesis test about B
- Transformations can be used
- latitude
- longitude
- both
- Describe nature of relationship between Y and X’s
- Determine explained / unexplained variation in Y
- Predict new Ys from X
- Find the “best” model
- Overfitting
- Parameter proliferation
- Multicollinearity
- Model selection
- Set of i= 1 to n observations
- fixed X-values for p predictor variables (X1, X2…Xp)
- random Y-values:
yi: value of Y for the ith observation, with X1 = xi1, X2 = xi2,…, Xp = xip
β0: population intercept, the mean value of Y when X1 = 0, X2 = 0,…, Xp = 0
β1: partial population slope, change in Y per unit change in X1 holding other X-vars constant
β2: partial population slope, change in Y per unit change in X2 holding other X-vars constant
βp: partial population slope, change in Y per unit change in Xp holding other X-vars constant
εi: unexplained error - difference bw yi and value predicted by model (ŷi)
NPP = β0 + β1(lat) + β2 (long) + β3 (soil fertility) + εi
- Estimate multiple regression parameters (intercept, partial slopes) using OLS to fit the regression line
- OLS minimizes ∑(yi - ŷi)², the SS of (vertical) distances between observed yi and predicted ŷi for each xij
- ε estimated as residuals: εi = yi-ŷi
- Calculation solves set of simultaneous normal equations with matrix algebra
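The normal-equation calculation can be sketched in code. This is a minimal pure-Python illustration (in practice you would use R's lm() or a linear-algebra library); the data are invented.

```python
# Sketch of OLS via the normal equations (X'X)b = X'y; invented data.

def xty(X, y):
    """Return X'y."""
    return [sum(X[i][j] * y[i] for i in range(len(X))) for j in range(len(X[0]))]

def xtx(X):
    """Return X'X."""
    p = len(X[0])
    return [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# Design matrix: intercept column plus two predictors
X = [[1.0, x1, x2] for x1, x2 in [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]]
y = [2.1, 2.9, 5.2, 5.8, 8.1]           # invented response values
beta = solve(xtx(X), xty(X, y))         # [b0, b1, b2]
yhat = [sum(b * xj for b, xj in zip(beta, row)) for row in X]
resid = [yi - yh for yi, yh in zip(y, yhat)]  # epsilon_i = y_i - yhat_i
```

The first normal equation forces the residuals to sum to zero when an intercept is fit, which is a quick sanity check on the solution.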
Confidence intervals calculated for parameters
Confidence and prediction intervals depend on number of observations and number of predictors
- More observations decrease interval width
- More predictors increase interval width
Prediction should be restricted to within range of X variables
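Prediction is just substitution into the fitted equation. A sketch for the NPP model above, with invented coefficient values (not estimates from real data):

```python
# Sketch: prediction from a fitted NPP equation; coefficients are invented.
# Predictions should stay within the observed ranges of the X variables.
b0, b_lat, b_long, b_soil = 50.0, -0.8, 0.1, 2.5

def predict_npp(lat, long_, soil):
    # NPP-hat = b0 + b1*lat + b2*long + b3*soil_fertility
    return b0 + b_lat * lat + b_long * long_ + b_soil * soil

npp_hat = predict_npp(35.0, -100.0, 4.0)
```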
SSregression is variance in Y explained by model
SSresidual is variance not explained by model
- MSresidual: estimates population variance
- MSregression: estimates population variance + variation due to strength of X-Y relationships
- MS do not depend on sample size
- “Basic” Ho: all partial regression slopes equal 0; β1 = β2 = … = βp = 0
- If “basic” Ho true, MSregression and MSresidual estimate the same variance, and their ratio (F-ratio) ≈ 1
- If “basic” Ho false (at least one β ≠ 0), MSregression estimates variance + variation due to partial regression slopes, and their ratio (F-ratio) will be > 1
- F-ratio compared to F-distribution for p-value
- E.g., does LAT have effect on NPP?
- These Hs tested through model comparison
- Model w 3 predictors X1, X2,X3 (model 1):
- yi= β0 +β1xi1+β2xi2+β3xi3+ εi
- To test Ho that β1 = 0 compare fit of model 1 to model 2:
- yi= β0 +β2xi2+β3xi3+ εi
- If SSregression of mod1=mod2, cannot reject Ho β1 = 0
- If SSregression of mod1 > mod2, evidence to reject Ho β1 = 0
- SS for β1 is SSextraβ1 = Full SSregression - Reduced SSregression
- Use partial F-test to test Ho β1 = 0 :
- r2 values cannot be used to directly compare models
- r2 values will always increase as predictors added
- r2 values with different transformation will differ
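The full-vs-reduced comparison can be illustrated in the simplest case, a single predictor, where the reduced model is the intercept-only (mean) model. A Python sketch with invented data:

```python
# Sketch of the extra-sum-of-squares (partial F) idea for one predictor:
# reduced model = intercept only, full model = intercept + X1. Data invented.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.9, 5.1, 5.8]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

ss_resid_full = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ss_resid_reduced = sum((yi - ybar) ** 2 for yi in y)  # mean-only model

ss_extra = ss_resid_reduced - ss_resid_full  # SS gained by adding X1
df_resid_full = n - 2                        # n - p - 1 with p = 1
F = (ss_extra / 1) / (ss_resid_full / df_resid_full)
```

A large F (compared to the F-distribution on 1 and n - p - 1 df) is evidence against Ho: β1 = 0.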
- Assume fixed Xs; unrealistic in most biological settings
- No major (influential) outliers
- Check leverage, influence (Cook’s Di)
- Normality, equal variance, independence
- Residual QQ-plots, residuals vs. predicted values plot
- Distribution/variance often corrected by transforming Y
- Ideally at least 10× as many observations as predictors, to avoid “overfitting”
- Uncorrelated predictor variables (assessed using scatterplot matrix; VIFs)
- Linear relationship between Y and each X, holding others constant (non-linearity assessed by AV plots)
- Potential predictor variables are often correlated (e.g., morphometrics, nutrients, climatic parameters)
- Multicollinearity (strong correlation between predictors) causes problems for parameter estimates
- Severe collinearity causes unstable parameter estimates: small change in a single value can result in large changes in βp estimates
- Inflates partial slope error estimates, loss of power
Variance Inflation Factors:
- VIF for \(X_j\) = \(1/(1 - R_j^2)\), where \(R_j^2\) is the r2 from regressing \(X_j\) on all other predictors
- VIF > 10 indicates problematic collinearity
Best/simplest solution:
- exclude variables that are highly correlated with other variables
- they are probably measuring a similar thing and are redundant
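A minimal sketch of the VIF calculation, assuming only two predictors so that \(R_j^2\) comes from a simple regression of X1 on X2; the data are invented to be nearly collinear.

```python
# Sketch: VIF for X1 when X2 is the only other predictor; invented data
# constructed so X1 and X2 are strongly correlated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 2.1, 2.9, 4.2, 4.9]  # nearly a copy of x1

n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n
# Simple regression of X1 on X2
slope = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / \
        sum((b - m2) ** 2 for b in x2)
intercept = m1 - slope * m2

ss_res = sum((a - (intercept + slope * b)) ** 2 for a, b in zip(x1, x2))
ss_tot = sum((a - m1) ** 2 for a in x1)
r2 = 1 - ss_res / ss_tot      # R_j^2: X1 explained by the other predictor
vif = 1 / (1 - r2)            # VIF_j = 1 / (1 - R_j^2)
```

Because X2 is nearly a copy of X1 here, the VIF is far above the rule-of-thumb cutoff of 10.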
- additive (effect of temp, plus precip, plus fertility) or
- multiplicative (interactive)
- Interaction: effect of Xi depends on levels of Xj
- The partial slope of Y vs. X1 is different for different levels of X2 (and vice versa); measured by β3
- many more predictors (“parameter proliferation”):
- 2^p: 6 predictors = 64 terms; 7 predictors = 128 terms
- interpretation more complex
- When to include interactions? When they make biological sense
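The idea that the partial slope of X1 depends on the level of X2 falls straight out of the interaction model; the coefficient values below are invented for illustration.

```python
# Sketch: in y = b0 + b1*x1 + b2*x2 + b3*x1*x2, the partial slope of
# y vs x1 is b1 + b3*x2, so it changes with x2. Coefficients invented.
b0, b1, b2, b3 = 1.0, 2.0, 0.5, -0.3

def yhat(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

def slope_wrt_x1(x2):
    # partial slope of y vs x1, holding x2 fixed at the given value
    return b1 + b3 * x2
```

With b3 ≠ 0 the regression surface is "curved": the slope along x1 differs at every level of x2.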
- Need 1 dummy var with two values (0, 1)
- Need 2 dummy var, each with two values (0, 1): fert1 (0 if L or H, 1 if M), fert2 (1 if H, 0 if L or M)
- R codes dummy variables automatically
- picks “reference” level alphabetically
- Dummy variables with more than 2 levels add extra predictor variables to model
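Dummy coding can be sketched as follows, mirroring the fert1/fert2 scheme with Low as the reference level. The helper function is hypothetical, for illustration only; it is not how R codes factors internally.

```python
# Sketch: dummy-coding a 3-level fertility factor with "Low" as reference.
def dummy_code(value):
    # 3 categories -> 3 - 1 = 2 dummy variables
    return [1 if value == "Med" else 0,    # fert1: 1 if Med, else 0
            1 if value == "High" else 0]   # fert2: 1 if High, else 0

obs = ["Low", "Med", "High", "Med"]
coded = [dummy_code(v) for v in obs]
# Low -> [0, 0] (reference), Med -> [1, 0], High -> [0, 1]
```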
- how to choose “best” model?
- Which predictors to include?
- Occam’s razor: “best” model balances complexity with fit to data
- compare “nested” models
- getting high r2 just by having more (useless) predictors
- so r2 is not a good way of choosing between nested models
- Adjusted r2
- Akaike’s information criterion (AIC)
- Both “penalize” models for extra predictors
- Higher adjusted r2 and lower AIC are better when comparing models
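A sketch of both penalized criteria computed directly from sums of squares, using the adjusted-r2 and AIC formulas given in this lecture; the SS values, n, and p below are invented.

```python
# Sketch: adjusted r^2 and AIC from sums of squares; numbers invented.
import math

n = 30            # observations
p = 3             # predictors
ss_total = 100.0
ss_resid = 40.0

r2 = 1 - ss_resid / ss_total
adj_r2 = 1 - (ss_resid / (n - (p + 1))) / (ss_total / (n - 1))
aic = n * math.log(ss_resid) + 2 * (p + 1) - n * math.log(n)
```

Adjusted r2 is always below plain r2 once predictors are in the model, reflecting the penalty for extra terms.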
Can fit all possible models
- compare AICs or adj-r2
- tedious with lots of predictors
Automated forward (and backward) stepwise procedures: start with no terms (all terms), add (remove) terms with the largest (smallest) partial F statistic
- Three general approaches:
- Using F-tests (or t-tests) on partial regression slopes
- Using coefficient of partial determination
- Using standardized partial regression slopes
- Conduct F tests of Ho that each partial regression slope = 0
- If cannot reject Ho, discard predictor
- Can get additional clues from relative size of F-values
- Does not tell us absolute importance of predictor (usually cannot directly compare slope parameters)
- the reduction in variation of Y due to addition of predictor Xj
- Increase in SSregression when Xj is added to model
- Reduced SSresidual is the unexplained SS from the model without Xj
- Partial regression slopes of predictor variables cannot be directly compared
- Why?
- Standardize all vars (mean = 0, sd= 1)
- Scales are identical, so a larger standardized partial regression slope means a more important variable
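Standardization is a simple transform; a Python sketch with invented values:

```python
# Sketch: standardize a variable to mean 0, sd 1 so partial slopes
# become comparable across predictors. Values are invented.
import statistics

def standardize(v):
    m = statistics.mean(v)
    s = statistics.stdev(v)  # sample standard deviation
    return [(x - m) / s for x in v]

lat = [10.0, 20.0, 30.0, 40.0]
z = standardize(lat)  # mean 0, sd 1
```

Applied to Y and every X before fitting, this yields standardized partial regression slopes.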
Lecture 10: Review
Covered
Regression, t-test, ANOVA
Regression Assumptions
Model II Regression
Lecture 11: Overview
Multiple Linear Regression model
Lecture 11: Analyses
What if more than one predictor (X) variable?
Lecture 13: 2 Factor or 2 Way ANOVA
Often consider more than 1 factor (independent categorical variable):
2-factor designs (2-way ANOVA) very common in ecology
Most multifactor designs: nested or factorial
Lecture 13: Nested and factorial designs
Consider two factors: A and B
Lecture 13: Nested and factorial designs
Nested Designs:
Lecture 13: Nested and factorial designs
Factorial Designs:
Lecture 13: Nested designs: examples
Study on effects of enclosure size on limpet growth:
Lecture 13: Nested designs: examples
Study on reef fish recruitment: 5 sites (factor A), 6 transects at each site (factor B), replicate observations along each transect
Lecture 13: Nested designs: examples
Effects of sea urchin grazing on biomass of filamentous algae:
Lecture 13: Factorial designs: examples
Effects of light level on growth of seedlings of different size:
Lecture 13: Factorial designs: examples
Effects of food level and tadpole presence on larval salamander growth
Lecture 13: Factorial designs: examples
Effect of season and density on limpet fecundity.
Lecture 13: Nested designs: linear model
Consider a nested design with:
Lecture 13: Nested designs: linear model
Can calculate several means:
Lecture 13: Nested designs: linear model
The linear model for a nested design is: \[y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk}\]
Where:
Lecture 13: Nested designs: linear model
The linear model for a nested design is:
The linear model for a nested design is: \[y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk}\]
Lecture 13: Nested designs: linear model
The linear model for a nested design is:
The linear model for a nested design is: \[y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk}\]
Lecture 13: Nested designs: analysis of variance
As before, partition the variance in the response variable using SS. SSA is the SS of differences between means in each level of A and the overall mean
Lecture 13: Multifactor ANOVA
SSB is the SS of differences between means in each level of B and the mean of the corresponding level of A, summed across levels of A
Lecture 13: Nested designs: analysis of variance
Lecture 13: Nested designs: null hypotheses
Two hypotheses tested on values of MS:
Lecture 13: Nested designs: null hypotheses
Two hypotheses tested on values of MS:
Lecture 13: Nested designs: null hypotheses
Conclusions?
“significant variation between replicate patches within each treatment, but no significant difference in amount of filamentous algae between treatments”
Lecture 13: Nested designs: unbalanced designs
Unequal sample sizes can be because of:
Not a problem, unless have unequal variance or large deviation from normality
Lecture 13: Nested designs: assumptions
As usual, we assume
Equal variance + normality need to be assessed at both levels:
| Dependent variable | Independent: Continuous | Independent: Categorical |
|---|---|---|
| Continuous | Regression | ANOVA |
| Categorical | Logistic regression | Tabular |
Lecture 11: Analyses
Abundance of C3 grasses can be modeled as function of
Instead of line, modeled with (hyper)plane
Lecture 11: Analyses
Used in similar way to simple linear regression:
Lecture 11: Analyses
Crawley 2012: “Multiple regression models provide some of the most profound challenges faced by the analyst”:
Lecture 11: Analyses
Multiple Regression:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
Lecture 11: Multiple linear regression model
Multiple Regression:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
Lecture 11: Regression parameters
Multiple Regression:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
Lecture 11: Regression parameters
Multiple Regression:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
Lecture 11: Regression parameters
Regression equation can be used for prediction by subbing new values for predictor (X) variables
Lecture 11: Analyses of variance
Variance - SStotal partitioned into SSregression and SSresidual
| Source of variation | SS | df | MS | Interpretation |
|---|---|---|---|---|
| Regression | \(\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2\) | \(p\) | \(\frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{p}\) | Difference between predicted observation and mean |
| Residual | \(\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\) | \(n-p-1\) | \(\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-p-1}\) | Difference between each observation and predicted |
| Total | \(\sum_{i=1}^{n} (y_i - \bar{y})^2\) | \(n-1\) | | Difference between each observation and mean |
Lecture 11: Analyses
SS converted to non-additive MS (SS/df)
| Source of variation | SS | df | MS |
|---|---|---|---|
| Regression | \(\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2\) | \(p\) | \(\frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{p}\) |
| Residual | \(\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\) | \(n-p-1\) | \(\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-p-1}\) |
| Total | \(\sum_{i=1}^{n} (y_i - \bar{y})^2\) | \(n-1\) | |
Lecture 11: Hypotheses
Two Hos usually tested in MLR:
Lecture 11: Hypotheses
Also: is any specific β = 0 (explanatory role)?
Lecture 11: Hypotheses
\[F_{1,\,n-p-1} = \frac{MS_{Extra}}{\text{Full }\ MS_{Residual}} \] Can also use a t-test (R provides this value)
Lecture 11: Explained variance
Explained variance (r2) is calculated the same way as for simple regression:
\[r^2 = \frac{SS_{Regression}}{SS_{Total}} = 1 - \frac{SS_{Residual}}{SS_{Total}} \]
Lecture 11: Assumptions and diagnostics
More observations than predictor variables
Lecture 11: Analyses
Regression of Y vs. each X does not consider effect of other predictors:
want to know shape of relationship while holding other predictors constant
Lecture 11: Collinearity
Collinearity can be detected by:
Lecture 11: Interactions
Predictors can be modeled as:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i \quad \text{vs.} \quad y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i1} x_{i2} + \epsilon_i\]
“Curvature” of the regression (hyper)plane
Lecture 11: Analyses
Adding interactions:
Lecture 11: Dummy variables
Multiple Linear Regression accommodates continuous and categorical variables (gender, vegetation type, etc.). Categorical vars are entered as “dummy vars”; number of dummy variables = number of categories - 1
Sex M/F:
Fertility L/M/H:
| Fertility | fert1 | fert2 |
|---|---|---|
| Low | 0 | 0 |
| Med | 1 | 0 |
| High | 0 | 1 |
Lecture 11: Analyses
Coefficients interpreted relative to reference condition
| Fertility | fert1 | fert2 |
|---|---|---|
| Low | 0 | 0 |
| Med | 1 | 0 |
| High | 0 | 1 |
Lecture 11: Analyses
Lecture 11: Comparing models
When have multiple predictors (and interactions!)
To choose:
Overfitting
Lecture 11: Comparing models
Need to account for increase in fit with added predictors:
\[\text{Adjusted } r^2 = 1 - \frac{SS_{\text{Residual}}/(n - (p + 1))}{SS_{\text{Total}}/(n - 1)}\] \[\text{Akaike Information Criterion (AIC)} = n[\ln(SS_{\text{Residual}})] + 2(p + 1) - n\ln(n)\]
Lecture 11: Comparing models
But how to compare models?
We will use manual form of backward selection
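The backward-selection loop can be sketched as follows. The AIC values below are invented placeholders standing in for refitted models; in practice each candidate model would be refit (e.g., with lm() in R) and its AIC computed.

```python
# Sketch of backward selection by AIC; AIC values are invented stand-ins
# for refitted models (predictor names lat/long/soil from the NPP example).
aic_of = {
    frozenset({"lat", "long", "soil"}): 120.0,
    frozenset({"lat", "long"}): 118.5,
    frozenset({"lat", "soil"}): 123.0,
    frozenset({"long", "soil"}): 125.0,
    frozenset({"lat"}): 121.0,
    frozenset({"long"}): 126.0,
}

current = frozenset({"lat", "long", "soil"})
while len(current) > 1:
    # Try dropping each predictor; keep the drop with the lowest AIC
    candidates = [current - {p} for p in current]
    best = min(candidates, key=lambda m: aic_of[m])
    if aic_of[best] < aic_of[current]:
        current = best   # deletion improved the model; continue
    else:
        break            # no single-term deletion lowers AIC; stop
```

Each pass deletes the term whose removal most improves AIC, stopping when every deletion makes the model worse.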
Lecture 11: Analyses
Lecture 11: Predictors
Usually want to know relative importance of predictors to explaining Y
Lecture 11: Predictors
Using F-tests (or t-tests) on partial regression slopes:
Lecture 11: Predictors
Using coefficient of partial determination:
\[r_{X_j}^2 = \frac{SS_{\text{Extra}}}{\text{Reduced }SS_{\text{Residual}}}\]
SSextra: the increase in SSregression when Xj is added to the model
Lecture 11: Predictors
Using standardized partial regression slopes:
Lecture 11: Predictors
Using partial r2 values:
Lecture 11: Reporting results
Results are easiest to report in tabular format